† Corresponding author. E-mail:
An understanding of protein folding/unfolding processes has important implications for all biological processes, including protein degradation, protein translocation, aging, and diseases. All-atom molecular dynamics (MD) simulations are uniquely suitable for it because of their atomic level resolution and accuracy. However, limited by computational capabilities, nowadays even for small and fast-folding proteins, all-atom MD simulations of protein folding still presents a great challenge. An alternative way is to study unfolding process using MD simulations at high temperature. High temperature provides more energy to overcome energetic barriers to unfolding, and information obtained from studying unfolding can shed light on the mechanism of folding. In the present study, a 1000-ns MD simulation at high temperature (500 K) was performed to investigate the unfolding process of a small protein, chicken villin headpiece (HP-35). To infer the folding mechanism, a Markov state model was also built from our simulation, which maps out six macrostates during the folding/unfolding process as well as critical transitions between them, revealing the folding mechanism unambiguously.
Despite decades of active research, protein folding remains one of the most important unsolved problems in molecular biology. An understanding of protein folding/unfolding has important implications for all biological processes, including protein degradation, protein translocation, aging, and diseases.[1,2] Recently, there has been substantial insight regarding the knowledge of native states as well as the stable intermediate states of several proteins.[3–6] However, less information is available about the path followed by a protein to fold to the native state and the characteristics of intermediates along the way. These intermediates are mostly short-lived and thus very difficult to trace.[7]
All-atom molecular dynamics (MD) simulations are uniquely suitable for the study of protein folding because of their atomic level resolution and accuracy. However, even for small and fast-folding proteins, all-atom MD simulations of protein folding still presents a great challenge, because the timescale of protein folding with all-atom MD simulations will exceed the computational capabilities of modern computers. The timescale of protein folding is on the order of 1–106 ms,[8] and the currently known fastest folding proteins fold over
An alternative way to overcome the challenge above is to study unfolding process using MD simulations at high temperature. High temperature provides more energy to overcome energetic barriers to unfolding. Moreover, there are numerous advantages to study unfolding rather than folding. Such simulations begin from a well-defined starting point, a crystal or NMR structure, which improves the odds of sampling experimentally relevant regions of conformational space, while simulations from an arbitrary extended structure present too many conformational possibilities, and the search problem becomes insurmountable. Furthermore, the system is less likely to become trapped in a local minimum during unfolding, which is common in attempts to simulate the folding reaction. Another advantage of studying unfolding is that the full reaction coordinate from the native to denatured states can be explored. Since the mechanism of folding can be probed from both directions, information obtained from studying unfolding can be shed light on the mechanism of folding.[1,15]
Markov state models (MSMs) are a powerful approach to map out conformational transitions as well as relevant conformational states. Rather than solely based on geometric criteria, MSMs are kinetic network models that can be used to identify metastable states and calculate their equilibrium thermodynamics and kinetics.[16–20] The models focus on metastable regions of phase space and partition conformational space into a number of metastable states such that intra-state transitions are fast but inter-state transitions are slow. This separation of timescales ensures that an MSM is Markovian and allows an MSM built from short simulations to model long timescale events.[21] Totally, MSMs can provide a reduced view of the ensemble of spontaneous fluctuations that the molecule undergoes at equilibrium, as well as the population and transition rates between key conformational states,[22–24] which has been demonstrated by many recent studies of protein folding[25,26] and protein dynamics.[27]
35-residue villin headpiece subdomain (HP-35, PDB entry 1vii)[28] is one of the smallest proteins that can fold autonomously. It contains only natural amino acids without any disulfide bonds, oligomerization, or ligand binding for stabilization.[29] Although it is very small, HP-35 comprises three α-helices (Fig.
In the present study, a 1000-ns MD simulation at high temperature (500 K) was performed to investigate the unfolding process of HP-35. From its native state, fully extended conformations were obtained by our 500-K MD simulation. Then, based on a MSM, the conformational dynamics during the unfolding process is modeled as transitions between kinetically metastable states, giving a reduced kinetic network between multiple conformational states of the protein. We revealed the key intermediate states along the unfolding pathway as well as the time scale associated with this process. This study reveals intermediate states as well as transition pathways during the unfolding process of HP-35 and hence greatly enhances our understanding of the folding/unfolding mechanism of the protein.
The simulation was performed using the PMEMD simulation module for graphics processing units (pmemd.cuda) in the MD program package AMBER14[39–41] and the AMBER ff14SB force field.[42] The initial coordinates for the MD simulation were taken from the NMR structure of HP-35 (PDB entry 1vii (Fig.
DSSP program[47] were used to measure the change in the secondary structure of the protein during its unfolding process. The secondary structure types were assigned based on H-bonding patterns: n-turn with an H- bond between the CO of residue i and the NH of residue
The program MSMBuilder3[48] was employed to build the MSM of the conformational landscape of HP-35 in four steps. (i) Each frame of the MD trajectory was transformed into a vector composed by the dihedral angles phi and psi, by which different conformations can be discriminated. (ii) Using time-structure independent components analysis (tICA),[49] the conformational space was reduced to six dimensions. On the basis of the tICA projections, the conformations were clustered into 1000 states (microstates) using the mini-batch k-medoids clustering algorithm. The conformations in the same state are geometrically so similar that one can reasonably assume that they can interconvert much faster within the state than between states. (iii) A transition matrix was constructed between these microstates at a proper lag time, in which the implied time scales are converged so that the microstate model can be demonstrated as Markovian. The free energy of each microstate was also generated as
Using the cpptraj module of AmberTools,[39] PCA was performed on the
The percentage vi of the total variance contained in a given eigenvector Li is given by
Since our simulation was run at high temperature (500 K), the root-mean- square deviation (RMSD) increase very rapidly at the early stage of the MD simulation: from 0 to 8 Å in less than 2 ns (see Fig.
As shown in Fig.
The extent of folding/unfolding is best measured by the intactness of the hydrophobic core. Here, the intactness of the hydrophobic core is represented by three pairwise distances of the atoms that compose the hydrophobic core: the distance between the HZ atom of PHE7 and the HB2 atom of PHE18, the distance between the HE1 atom of PHE11 and the HB3 atom of LEU29, as well as the distance between the HZ atom of PHE18 and the HG3 atom of LYS30 (marked as 1, 2, and 3 in Fig.
To elucidate the change in the secondary structure, all types of secondary structure elements were extracted from our simulation trajectory data, which is shown in Fig.
As mentioned above, HP-35 consists only three helices with turn and bend structures between them. As shown in Fig.
To elucidate the conformational changes of HP-35, a kinetic network MSM was built from our simulation data, shown in Fig.
As illustrated in Fig.
Our MSM provides novel insights into the conformational changes of HP-35 during the unfolding process. In these six metastable states, major conformational changes are detected in the three helix structures, which agrees well with the secondary structure analysis (Figs.
Figure
All
The vector field representation of the first three PCs obtained from the MD simulation as well as the corresponding eigenvalues is shown in Fig.
Limited by computational capabilities, nowadays even for small and fast-folding proteins, all-atom MD simulations of protein folding still presents a great challenge. However, there is an alternative way: studying unfolding process using MD simulations at high temperature. High temperature provides more energy to overcome energetic barriers to unfolding. More importantly, based on the previous investigation,[1] the mechanisms of protein folding can be reflected by unfolding process. However, although there is agreement between the unfolding simulations and experiments that investigate both folding and unfolding, it is acknowledged that sampling in MD is limited and that unfolding may not always be the same as folding. Some folding intermediates may disappear at high temperatures and some forbidden pathways may make contributions to the unfolding and refolding of the protein. For example, previous investigation shows that the higher- temperature simulations failed to capture the desolvation barrier involving the packing of the hydrophobic core that was observed at lower temperatures.
In the present study, a long time (1000 ns) MD simulation at high temperature (500 K) was performed to investigate the unfolding process of HP-35. To infer the folding mechanism, a MSM was also built from our simulation, which gives a reduced kinetic network between multiple conformational states of the protein. In our simulation, six macrostates coexist and transitions between these states constantly occur. Among these states, the highest-flux pathway of HP-35 folding is the state
[1] | |
[2] | |
[3] | |
[4] | |
[5] | |
[6] | |
[7] | |
[8] | |
[9] | |
[10] | |
[11] | |
[12] | |
[13] | |
[14] | |
[15] | |
[16] | |
[17] | |
[18] | |
[19] | |
[20] | |
[21] | |
[22] | |
[23] | |
[24] | |
[25] | |
[26] | |
[27] | |
[28] | |
[29] | |
[30] | |
[31] | |
[32] | |
[33] | |
[34] | |
[35] | |
[36] | |
[37] | |
[38] | |
[39] | |
[40] | |
[41] | |
[42] | |
[43] | |
[44] | |
[45] | |
[46] | |
[47] | |
48 | |
[49] | |
[50] | |
[51] | |
[52] | |
[53] | |
[54] | |
[55] | |
[56] | |
[57] | |
[58] | |
[59] | |
[60] |